Scalable systems for controlling color management comprising varying levels of metadata

ABSTRACT

Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 61/494,014 filed 27 May 2011, which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The present invention relates to image processing and, more particularly, to the encoding and decoding of image and video signals employing metadata and, still more particularly, to metadata organized in various layers.

BACKGROUND

Known scalable video encoding and decoding techniques allow for the expansion or contraction of video quality, depending on the capabilities of the target video display and the quality of the source video data.

However, improvements in image and/or video rendering, and in the viewer experience, may be made through the use and application of image metadata, whether in a single level or in various levels of metadata.

SUMMARY

Several embodiments of scalable image processing systems and methods are disclosed herein whereby color management processing of source image data to be displayed on a target display is changed according to varying levels of metadata.

In one embodiment, a method for processing and rendering image data on a target display through a set of levels of metadata is disclosed, wherein the metadata is associated with the image content. The method comprises inputting the image data; ascertaining the set of levels of metadata associated with the image data; if no metadata is associated with the image data, performing at least one of a group of image processing steps, said group comprising: switching to default values and adaptively calculating parameter values; and, if metadata is associated with the image data, calculating color management algorithm parameters according to the set of levels of metadata associated with the image data.

In yet another embodiment, a system for decoding and rendering image data on a target display through a set of levels of metadata is disclosed. The system comprises: a video decoder, said video decoder receiving input image data and outputting intermediate image data; a metadata decoder, said metadata decoder receiving input image data, wherein said metadata decoder is capable of detecting a set of levels of metadata associated with said input image data and outputting intermediate metadata; a color management module, said color management module receiving intermediate metadata from said metadata decoder, receiving intermediate image data from said video decoder, and performing image processing upon intermediate image data based upon said intermediate metadata; and a target display, said target display receiving and displaying the image data from said color management module.

Other features and advantages of the present system are presented below in the Detailed Description when read in connection with the drawings presented within this application.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments are illustrated in referenced figures of the drawings. It is intended that the embodiments and figures disclosed herein are to be considered illustrative rather than restrictive.

FIGS. 1A, 1B and 1C show one embodiment of a current video pipeline from creation, to distribution, to consumption of a video signal.

FIG. 2A depicts one embodiment of a video pipeline that comprises a metadata pipeline in accordance with the teachings of the present application.

FIG. 2B depicts one embodiment of a metadata prediction block.

FIG. 3 shows one embodiment of a sigmoidal curve that employs Level 1 metadata.

FIG. 4 shows one embodiment of a sigmoidal curve that employs Level 2 metadata.

FIG. 5 shows one embodiment of a histogram plot based on image/scene analysis that may be used to adjust the image/video mapping onto a target display.

FIG. 6 shows one embodiment of an adjusted image/video mapping based on Level 3 metadata that includes a second reference display grading of the image/video data.

FIG. 7 shows one embodiment of a linear mapping that might occur if the target display is a substantially good match to the second reference display used to color grade the image/video data.

FIG. 8 is one embodiment of a video/metadata pipeline, made in accordance with the principles of the present application.

DETAILED DESCRIPTION

Throughout the following description, specific details are set forth in order to provide a more thorough understanding to persons skilled in the art. However, well known elements may not have been shown or described in detail to avoid unnecessarily obscuring the disclosure. Accordingly, the description and drawings are to be regarded in an illustrative, rather than a restrictive, sense.

Overview

One aspect of video quality concerns itself with having images or video rendered on a target display with the same, or substantially the same, fidelity as was intended by the creator of the images or video. It is desirable to have a Color Management (CM) scheme that tries to maintain the original appearance of video content on displays with differing capabilities. In order to accomplish this task, it might be desirable that such a CM algorithm be able to predict how the video appeared to viewers in the post production environment where it was finalized.

To illustrate the issues germane to the present application and system, FIGS. 1A, 1B and 1C depict one embodiment of a current video pipeline 100 that follows a video signal from creation, to distribution and to consumption of that video signal.

Creation 102 of a video signal may occur with the video signal being color graded 104 by a color grader 106, who may grade the signal for various image characteristics of an input video signal—e.g. luminance, contrast and color rendering. Color grader 106 may grade the signal to produce image/video mapping 108, and such grading may be done on a reference display device 110 that may have, for example, a gamma response curve 112.

Once the signal has been graded, the video signal may be sent through a distribution 114—where such distribution should be properly conceived of broadly. For example, distribution could be via the internet, DVD, movie theatre showings and the like. In the present case, the distribution is shown in FIG. 1A as taking the signal to a target display 120 of maximum luminance of 100 nits and having gamma response curve 124. Assuming that the reference display 110 had substantially the same maximum luminance as the target display and substantially the same response curve, then the mapping applied to the video signal may be as simple as a 1:1 mapping 122—made in accordance with, for example, the Rec. 709 STD for color management 118. Holding all other factors equal (such as, for example, ambient light conditions at the target display), what one might see at the reference display is substantially what one would see at the target display.

This situation may change, for example, as shown in FIG. 1B, wherein the target display 130 differs from the reference display 110 in several aspects—e.g. maximum luminance (500 nits as opposed to 100 nits for the reference display). In this case, the mapping 132 might be a 1:5 mapping to render on the target display. In such a case, the mapping is a linear stretch through the Rec. 709 CM block. Any potential distortion from reference display viewing to target display viewing may or may not be objectionable to the viewer, depending on levels of individual discrimination. For example, the darks and mid-tones are stretched but might be acceptable. In addition, the stretch may make MPEG blocking artifacts more significant.

FIG. 1C shows a more extreme example. Here the target display 140 may have more significant differences from the reference display. For example, target display 140 has a maximum luminance of 1000 nits—as opposed to 100 nits for the reference display. If the same linear stretch mapping 142 were to be applied to the video signal going to the target display, then much more noticeable, and objectionable, distortions may be present for the viewer. For example, the video content may be displayed at a significantly higher luminance level (a 1:10 ratio). The darks and mid-tones may be stretched to a point where camera noise of the original capture is noticeable, and banding in the dark areas of the image becomes more significant. In addition, the MPEG blocking artifacts may be more significant.

Without exploring exhaustively all possible examples of how objectionable artifacts may appear to the viewer, it may be instructive to discuss a few more. For example, suppose that the reference display had a larger maximum luminance (say, 600 nits) than the target display (say, 100 nits). In this case, if the mapping is again a linear stretch (here 6:1), then the content may be displayed at an overall lower luminance level; the image may appear dark, and the dark detail of the image may exhibit a noticeable crush.

In yet another example, suppose the reference display has a different maximum luminance (say, 600 nits) from the target display (say, 1000 nits). Applying a linear stretch, even though there may be only a small ratio difference (that is, close to 1:2), the magnitude difference in maximum luminance is potentially large and objectionable. Due to the magnitude difference, the image may be far too bright and might be uncomfortable to watch. The mid-tones may be stretched unnaturally and might appear to be washed out. In addition, both camera noise and compression noise may be noticeable and objectionable. In yet another example, suppose the reference display has a color gamut equal to P3 and the target display has a gamut that is smaller than Rec. 709. Assume the content was color graded on the reference display but the rendered content has a gamut equivalent to the target display. In this case, mapping the content from the reference display gamut to the target gamut might unnecessarily compress the content and desaturate the appearance.

Without some sort of intelligent (or at least more accurate) model of image rendering on a target display, it is likely that some distortion or objectionable artifacts will be apparent for the viewer of the images/video. In fact, it is likely that what the viewer experiences is not what was intended by the creator of the images/video. While the discussion has focused on luminance, it will be appreciated that the same concerns also apply to color. In fact, if there is a difference between the source display's color space and the target display's color space and that difference is not properly accounted for, then color distortion would be a noticeable artifact as well. The same concept holds for any differences in the ambient environment between the source display and the target display.

Use of Metadata

As these examples set out, it may be desirable to have an understanding as to the nature and capabilities of the reference display, target display and source content in order to render with as high a fidelity to the originally intended video as possible. There is other data—data that describes aspects of, and conveys information about, the raw image data—called "metadata" that is useful in such faithful renderings.

While tone and gamut mappers generally perform adequately for roughly 80-95% of the images processed for a particular display, there are issues using such generic solutions to process the images. Typically, these methods do not guarantee the image displayed on the screen matches the intent of the director or initial creator. It has also been noted that different tone or gamut mappers may work better with different types of images or better preserve the mood of the images. In addition, it is also noted that different tone and gamut mappers may cause clipping and loss of detail or a shift in color or hue.

When tone-mapping a color-graded image-sequence, the color-grading parameters such as the content's minimal black level and maximum white level may be desirable parameters to drive the tone-mapping of color-graded content onto a particular display. The color-grader has already made the content (on a per image, as well as a temporal basis) look the way he/she prefers. When translating it to a different display, it may be desired to preserve the perceived viewing experience of the image sequence. It should be appreciated that with increasing levels of metadata, it may be possible to improve such preservation of the appearance.

For example, assume that a sunrise sequence has been filmed, and color-graded by a professional on a 1000 nit reference display. In this example, the content is to be mapped for display on a 200 nit display. The images before the sun rises may not be using the whole range of the reference display (e.g. 200 nits max). As soon as the sun rises, the image sequence could use the whole 1000 nit range, which is the maximum of the content. Without metadata, many tone-mappers use the maximum value (such as luminance) as a guideline for how to map content. Thus, the tone-curves applied to the pre-sunrise images (a 1:1 mapping) may be different than the tone-curves applied to the post-sunrise images (a 5× tone compression). The resulting images shown on the target display may have the same peak luminance before and after the sunrise, which is a distortion of the creative intent. The artist intended for the image to be darker before the sunrise and brighter during, as it was produced on the reference display. In this scenario, metadata may be defined that fully describes the dynamic range of the scene; and the use of that metadata may ensure that the artistic effect is maintained. It may also be used to minimize luminance temporal issues from scene to scene.
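A small numeric sketch (Python; the luminance values are taken from this example, and the function names are hypothetical) makes the distortion concrete:

```python
# Illustrative sketch: a metadata-blind tone mapper that normalizes by the
# per-frame maximum flattens the sunrise, while a mapper driven by the
# content's true maximum (carried as metadata) preserves the graded intent.

TARGET_PEAK = 200.0  # nits, target display maximum

def scale_blind(frame_max_nits: float) -> float:
    """Scale so the frame's own maximum hits the target peak."""
    return TARGET_PEAK / frame_max_nits

def scale_with_metadata(content_max_nits: float) -> float:
    """Scale by the content's overall maximum, supplied as metadata."""
    return TARGET_PEAK / content_max_nits

pre_max, post_max, content_max = 200.0, 1000.0, 1000.0

# Blind mapping: both frames peak at 200 nits -> creative intent lost.
print(pre_max * scale_blind(pre_max))    # 200.0
print(post_max * scale_blind(post_max))  # 200.0

# Metadata-driven mapping: pre-sunrise stays 5x darker, as graded.
print(pre_max * scale_with_metadata(content_max))   # 40.0
print(post_max * scale_with_metadata(content_max))  # 200.0
```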

For yet another example, consider the reverse of the above-given situation. Assume that Scene 1 is graded for 350 nits and that Scene 1 is filmed in outdoor natural light. If Scene 2 is filmed in a darkened room, and shown in the same range, then Scene 2 would appear to be too dark. The use of metadata in this case could be used to define the proper tone curve and ensure that Scene 2 is appropriately visible. In yet another example, suppose the reference display has a color gamut equal to P3 and the target display has a gamut that is smaller than Rec. 709. Assume the content was color graded on the reference display but the rendered content has a gamut equivalent to the target display. The use of metadata that defines the gamut of the content and the gamut of the source display may enable the mapping to make an intelligent decision and map the content gamut 1:1. This may ensure the content color saturation remains intact.

In certain embodiments of the present system, tone and gamut need not be treated as separate entities or conditions of a set of images/video. "Memory colors" are colors in an image that, even though a viewer may not be aware of the initial intent, will look wrong if adjusted incorrectly. Skin tones, sky, and grass are good examples of memory colors whose hue, when tone mapped, might be changed so as to look wrong. In one embodiment, the gamut mapper has knowledge of a protected color (as metadata) in an image to ensure its hue is maintained during the tone mapping process. The use of this metadata may define and highlight protected colors in the image to ensure correct handling of memory colors. The ability to define localized tone and gamut mapper parameters is an example of metadata that is not necessarily a mere product of the reference and/or target display parameters.

One Embodiment of a Robust Color Management

In several embodiments of the present application, systems and methods for providing a robust color management scheme are disclosed, whereby several sources of metadata are employed to provide better image/video fidelity that matches the original intent of the content creator. In one embodiment, various sources of metadata may be added to the processing, according to the availability of certain metadata, as will be discussed in greater detail herein.

As merely one example, FIG. 2A shows a high level block diagram of an image/video pipeline 200 that employs metadata. Image creation and post-production may take place in block 202. Video source 208 is input into video encoder 210. Metadata 204, captured along with the video source, is input into a metadata encoder 206. Examples of metadata 204 have been previously discussed, but may include such items as gamut boundaries and other parameters of the source and/or reference display, the environment of the reference display and other encoding parameters. In one embodiment, the metadata accompanies the video signals, as a subset of metadata might be co-located temporally and spatially with the video signals that are intended to be rendered at a given time. Together, metadata encoder 206 and video encoder 210 may be considered the source image encoder.

Video signal and metadata are then distributed via distribution 212—in any suitable manner—e.g. multiplexed, serial, parallel or by some other known scheme. It should be appreciated that distribution 212 should be conceived of broadly for the purposes of the present application. Suitable distribution schemes might include: internet, DVD, cable, satellite, wireless, wired or the like.

Video signals and metadata, thus distributed, are input into a target display environment 220. Metadata and video decoders, 222 and 224 respectively, receive their respective data streams and provide decoding appropriate for the characteristics of the target display, among other factors. Metadata at this point might preferably be sent to either a third party Color Management (CM) block 220 and/or to one of the embodiments of a CM module 228 of the present application. In the case that the video and metadata are processed by CM block 228, CM parameter generator 232 may take as inputs metadata from metadata decoder 222 as well as metadata prediction block 230.

Metadata prediction block 230 may make certain predictions of a higher fidelity rendering based upon knowledge of previous images or video scenes. The metadata prediction block gathers statistics from the incoming video stream in order to estimate metadata parameters. One possible embodiment of a metadata prediction block 230 is shown in FIG. 2B. In this embodiment, a histogram 262 of the log of the image luminance may be calculated for each frame. An optional low pass filter 260 may precede the histogram in order to (a) reduce the sensitivity of the histogram to noise and/or (b) partially account for natural blur in the human vision system (e.g. humans perceive a dither pattern as a solid color patch). From the histogram, the minimum 266 and maximum 274 values are captured. The toe 268 and shoulder 272 points can also be captured based on percentile settings (like 5% and 95%). The geometric mean 270 (log average) can also be calculated and used as the mid point. These values may be temporally filtered so that, e.g., they do not jerk around too quickly. These values may also be reset during a scene change if desired. Scene changes may be detected from black frame insertion, extreme radical jumps in the histogram, or any other such technique. It will be appreciated that the scene change detector 264 could detect scene changes from either histogram data, as shown, or from the video data directly.
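The following is a minimal sketch of such statistics gathering, assuming Python with NumPy, per-frame linear luminance as input, and a small floor to keep the log finite. The optional low pass filter is omitted, and the temporal filtering is sketched separately below; all names are illustrative.

```python
import numpy as np

def estimate_metadata(luminance, toe_pct=5.0, shoulder_pct=95.0):
    """Gather per-frame statistics to estimate missing metadata, in the
    spirit of FIG. 2B. `luminance` is an array of linear luminance
    values (nits)."""
    log_lum = np.log10(np.maximum(np.asarray(luminance, float), 1e-4))
    hist, _ = np.histogram(log_lum, bins=256)     # histogram 262
    return {
        "min": float(log_lum.min()),              # minimum 266
        "max": float(log_lum.max()),              # maximum 274
        "toe": float(np.percentile(log_lum, toe_pct)),            # toe 268
        "shoulder": float(np.percentile(log_lum, shoulder_pct)),  # shoulder 272
        # Geometric mean of luminance == arithmetic mean in log space.
        "mid": float(log_lum.mean()),             # geometric mean 270
        "hist": hist,
    }
```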

In yet another embodiment, the system might compute the mean of the image intensity values (luminance). Image intensity may then be scaled by a perceptual weighting, such as log, a power function, or a LUT. The system might then estimate the highlight and shadow regions (e.g. headroom and footroom in FIG. 5) from pre-determined percentiles of the image histogram (for example 10% and 90%). Alternatively, the system may estimate the highlight and shadow regions from where the slope of the histogram is above or below a certain threshold. Many variations are possible—for example, the system may calculate the maximum and minimum values of the input image directly, or take them from pre-defined percentiles (for example 1% and 99%).

In other embodiments, the values may be stabilized over time (e.g. frame to frame), such as with a fixed rise and fall rate. Sudden changes may be indicative of a scene change, so the values might be exempt from time-stabilization. For example, if the change is below a certain threshold, the system might limit the rate of change; otherwise, it might go with the new value. Alternatively, the system may reject certain values from influencing the shape of the histogram (such as letterbox, or zero values).
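A hedged sketch of such time-stabilization (plain Python; the threshold and step defaults are illustrative assumptions, in log10-luminance units):

```python
def stabilize(prev, new, max_step=0.05, scene_cut_threshold=0.5):
    """Rate-limit frame-to-frame changes of an estimated value (e.g. the
    log-luminance mid point). A change past the threshold is treated as
    a scene change and adopted immediately; smaller changes are slewed
    at a fixed rise/fall rate."""
    delta = new - prev
    if abs(delta) >= scene_cut_threshold:
        return new                       # likely scene change: reset
    return prev + max(-max_step, min(max_step, delta))  # fixed rise/fall
```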

In addition, CM parameter generator 232 could take other metadata (i.e. not necessarily based on content creation)—such as display parameters, the ambient display environment and user preferences—to factor into the color management of the images/video data. It will be appreciated that display parameters could be made available to CM parameter generator 232 by standard means, e.g. EDID or the like, via interfaces such as DDC serial interfaces, HDMI, DVI or the like. In addition, ambient display environment data may be supplied by ambient light sensors (not shown) that measure the ambient light conditions or the reflectance of such light from the target display.

Having received any appropriate metadata, CM parameter generator 232 may set parameters in a downstream CM algorithm 234 which may concern itself with the final mapping of image/video data upon the target display 236. It should be appreciated that there does not need to be a bifurcation of functions as shown between CM parameter generator 232 and CM algorithm 234. In fact, in some embodiments, these features may be combined in one block.

Likewise, it will be appreciated that the various blocks forming FIGS. 2A and 2B are optional from the standpoint of the present embodiment and that it is possible to design many other embodiments—with or without these recited blocks—that are within the scope of the present application. In addition, CM processing may take place at different points in the image pipeline 200 and not necessarily as depicted in FIG. 2A. For example, CM of the target display may be placed and contained within the target display itself, or such processing may be performed in a set top box. Alternatively, depending on what level of metadata processing is available or deemed appropriate, CM of the target display could take place in the distribution or at the point of post-production.

Scalable Color Management Using Varying Levels of Metadata

In several embodiments of the present application, systems and methods for providing a scalable color management scheme are disclosed, whereby the several sources of metadata may be arranged in a set of varying levels of metadata to provide an even higher level of image/video fidelity to the original intent of the content creator. In one embodiment, various levels of metadata may be added to the processing, according to the availability of certain metadata, as will be discussed in greater detail herein.

In many embodiments of the present system, suitable metadata algorithms may consider a plethora of information, such as, for example:

(1) the encoded video content,
(2) a method for converting the encoded content into linear light,
(3) the gamut boundaries (both luminance and chromaticity) of the source content, and
(4) information on the post-production environment.

The method for converting to linear light may be desirable so that the appearance (luminance, color gamut, etc.) of the actual image observed by the content creators can be calculated. The gamut boundaries aid in specifying in advance what the outer-most colors may be, so that such outer-most colors may be mapped into the target display without clipping or leaving too much overhead. The information on the post production environment may be desirable so that any external factors that could influence the appearance of the display might be modeled.

In current video distribution mechanisms, only the encoded video content is provided to a target display. It is assumed that the content has been produced in a reference studio environment using reference displays compliant with Rec. 601/709 and various SMPTE standards. The target display system is typically assumed to comply with Rec. 601/709—and the target display environment is largely ignored. Because of the underlying assumption that the post-production display and target display will both comply with Rec. 601/709, neither of the displays may be upgraded without introducing some level of image distortion. In fact, as Rec. 601 and Rec. 709 differ slightly in their choice of primaries, some distortion may have already been introduced.

One embodiment of a scalable system of metadata levels is disclosed herein that enables the use of reference and target displays with a wider and enhanced range of capabilities. The various metadata levels enable a CM algorithm to tailor source content for a given target display with increasing levels of accuracy. The following sections describe the proposed levels of metadata:

Level 0

Level 0 metadata is the default case and essentially means zero metadata. Metadata may be absent for a number of reasons, including:

(1) Content creators did not include it (or it was lost at some point in the post-production pipeline).
(2) Display switches between content (i.e. channel surfing or commercial break).
(3) Data corruption or loss.

In one embodiment, it may be desirable that CM processing handle Level 0 (i.e. where no metadata is present) either by estimating it based on video analysis or by assuming default values.

In such an embodiment, Color Management algorithms may be able to operate in the absence of metadata in at least two different ways:

Switch to Default Values

In this case a display would operate much like today's distribution system, where the characteristics of the post production reference display are assumed. Depending on the video encoding format, the assumed reference display could potentially be different. For example, a Rec. 601/709 display could be assumed for 8 bit RGB data. If color graded on a professional monitor (such as a ProMonitor) in 600 nit mode, a P3 or Rec. 709 gamut could be assumed for higher bit depth RGB data or LogYuv encoded data. This might work well if there is only one standard or a de facto standard for higher dynamic range content. However, if the higher dynamic range content is created under custom conditions, the results may not be greatly improved and may be poor.

Adaptively Calculate Parameter Values

In this case, the CM algorithm might start with some default assumptions and refine those assumptions based on information gained from analyzing the source content. Typically, this might involve analyzing the histogram of the video frames to determine how to best adjust the luminance of the incoming source, possibly by calculating parameter values for a CM algorithm. In doing so, there may be a risk that it produces an 'auto exposure' type of look to the video, where each scene or frame is balanced to the same luminance level. In addition, some formats may present other challenges—for example, there is currently no automated way to determine the color gamut if the source content is in RGB format.

In another embodiment, it is possible to implement a combination of the two approaches. For example, gamut and encoding parameters (like gamma) could be assumed to be a standardized default value and a histogram could be used to adjust the luminance levels, as sketched below.
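A brief sketch of this combination, reusing the hypothetical estimate_metadata() helper from the metadata-prediction sketch above; the Rec. 709-style default values are illustrative assumptions, not values prescribed by the text:

```python
# Illustrative defaults assumed in the absence of metadata.
REC709_DEFAULTS = {"gamma": 2.4, "primaries": "Rec.709",
                   "white_point": "D65", "peak_nits": 100.0}

def level0_parameters(luminance_frame):
    """Assume standardized gamut/encoding defaults and refine only the
    luminance-related values from the frame histogram."""
    params = dict(REC709_DEFAULTS)
    stats = estimate_metadata(luminance_frame)   # adaptive luminance part
    params.update(min_log_lum=stats["min"],
                  mid_log_lum=stats["mid"],
                  max_log_lum=stats["max"])
    return params
```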

Level 1

In the present embodiment, Level 1 metadata provides information describing how the source content was created and packaged. This data may allow CM processing to predict how the video content actually appeared to the content producers. The Level 1 metadata parameters may be grouped into four areas:

(1) video encoding parameters,
(2) source display parameters,
(3) source content gamut parameters, and
(4) environmental parameters.

Video Encoding Parameters

As most Color Management algorithms work at least partially in a linear light space, it may be desirable to have a method to convert the encoded video to a linear (but relative) (X,Y,Z) representation—either inherent in the encoding scheme or provided as metadata itself. For example, encoding schemes such as LogYuv, OpenEXR, LogYxy or LogLuv TIFF inherently contain the information necessary to convert to a linear light format. However, for many RGB or YCbCr formats, additional information such as gamma and color primaries may be desired. As an example, to process YCbCr or RGB input, the following pieces of information may be supplied (item (1) is sketched in code after this list):

(1) the coordinates of the primaries and white point used for encoding the source content. This may be used to generate the RGB to XYZ color space transform matrix: (x,y) coordinates for each of red, green, blue, and white.
(2) the minimum and maximum code values (e.g. 'standard' or 'full' range). This may be used to convert code values into normalized input values.
(3) the global or per-channel response curve for each primary (e.g. 'gamma'). This may be used to linearize the intensity values by undoing any non-linear response that may have been applied by the interface or the reference display.
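For item (1), constructing the RGB to XYZ matrix from the four (x, y) chromaticity pairs is standard colorimetry; the following Python/NumPy sketch shows the construction (the Rec. 709/D65 numbers in the usage line are only an example):

```python
import numpy as np

def rgb_to_xyz_matrix(r, g, b, white):
    """Build the RGB -> XYZ matrix from the (x, y) chromaticities of the
    three primaries and the white point, as carried in Level 1 metadata.
    Each argument is an (x, y) pair."""
    def xy_to_xyz(x, y):
        # Chromaticity to tristimulus with Y normalized to 1.
        return np.array([x / y, 1.0, (1.0 - x - y) / y])
    prim = np.column_stack([xy_to_xyz(*r), xy_to_xyz(*g), xy_to_xyz(*b)])
    # Scale each primary column so that RGB = (1, 1, 1) maps to white.
    scale = np.linalg.solve(prim, xy_to_xyz(*white))
    return prim * scale

# Example with Rec. 709 primaries and D65 white:
M = rgb_to_xyz_matrix((0.640, 0.330), (0.300, 0.600),
                      (0.150, 0.060), (0.3127, 0.3290))
```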

Source Display Gamut Parameters

It may be useful for the Color Management algorithms to know the color gamut of the source display. These values correspond to the capabilities of the reference display used to grade the content. The source display gamut parameters, measured preferably in a completely dark environment, might include:

(1) Primaries, such as provided as CIE x,y chromaticity coordinates with maximum luminance specified, or XYZ.
(2) Tristimulus values for white and black, such as CIE XYZ.

Source Content Gamut Parameters

It may be useful for the Color Management algorithms to know the bounds of the color gamut used in generating the source content. Typically, these values correspond to the capabilities of the reference display used to grade the content; however, they may be different due to software settings—or if only a subset of the display's capabilities was used. In some cases, the gamut of the source content may not match the gamut of the encoded video data. For example, the video data may be encoded in LogYuv (or some other encoding) which encompasses the entire visual spectrum. The source gamut parameters might include:

(1) Primaries, such as provided as CIE x,y chromaticity coordinates with maximum luminance specified, or XYZ.
(2) Tristimulus values for white and black, such as CIE XYZ.

Environmental Parameters

In certain circumstances, just knowing the light levels produced by the reference display may not be enough to determine how the source content 'appeared' to viewers in post production. Information regarding the light levels produced by the ambient environment may also be useful. The combination of both display and environmental light is the signal that strikes the human eye and creates an "appearance". It may be desired to preserve this appearance through the video pipeline. The environmental parameters, preferably measured in the normal color grading environment, might include:

(1) Reference monitor surround color, provided as an absolute XYZ value. The viewer's level of adaptation to their environment may be estimated using this value.
(2) Absolute XYZ value for the black level of the reference monitor in the normal color grading environment. The impact of the ambient lighting on the black level might be determined using this value.
(3) Color temperature of the ambient light, provided as the absolute XYZ value of a white reflective sample on the front of the screen (like paper). The viewer's white point adaptation may be estimated using this value.

As noted, Level 1 metadata may provide the gamut, encoding and environmental parameters for the source content. This may allow the CM solution to predict how the source content appeared when approved. However, it may not provide much guidance on how to best adjust the colors and luminance to suit the target display.

In one embodiment, a single sigmoidal curve applied globally to video frames in RGB space may be a simple and stable way of mapping between different source and target dynamic ranges. Additionally, a single sigmoidal curve may be used to modify each channel (R, G, B) independently. Such a curve might also be sigmoidal in some perceptual space, such as log or power functions. An example curve 300 is shown in FIG. 3. It will be appreciated that other mapping curves would also be suitable, such as a linear map (as shown in FIGS. 3, 4 and 6), or other maps, such as gamma.

In this case, the minimum and maximum points on the curve are known from the Level 1 metadata and information on the target display. The exact shape of the curve could be static—one that has been found to work well on average based on the input and output range. It could also be modified adaptively based on the source content.
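As a concrete illustration of such a global curve, the following is a minimal sketch, assuming Python with NumPy, of a sigmoidal mapping built in log10 space. The logistic shape, the `slope` knob, and the endpoint renormalization are illustrative choices, not a formula prescribed by the text:

```python
import numpy as np

def sigmoid_tone_map(L_in, src_min, src_max, dst_min, dst_max, slope=1.5):
    """Map linear luminance L_in (nits) from the source range onto the
    target range with a logistic S-curve in log10 space. src_min/src_max
    come from Level 1 metadata; dst_min/dst_max describe the target
    display. All luminances must be > 0."""
    lx = np.log10(np.asarray(L_in, dtype=float))
    lo, hi = np.log10(src_min), np.log10(src_max)
    t = np.clip((lx - lo) / (hi - lo), 0.0, 1.0)         # source range -> 0..1
    s = 1.0 / (1.0 + np.exp(-6.0 * slope * (t - 0.5)))   # logistic S-curve
    s0 = 1.0 / (1.0 + np.exp(3.0 * slope))               # curve value at t = 0
    s1 = 1.0 / (1.0 + np.exp(-3.0 * slope))              # curve value at t = 1
    s = (s - s0) / (s1 - s0)                             # pin endpoints to 0 and 1
    ly = np.log10(dst_min) + s * (np.log10(dst_max) - np.log10(dst_min))
    return 10.0 ** ly
```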

Level 2

Level 2 metadata provides additional information about the characteristics of the source video content. In one embodiment, Level 2 metadata may divide the luminance range of the source content into specific luminance regions. More specifically, one embodiment might break the luminance range of the source content into five regions, where the regions may be defined by points along the luminance range. Such ranges and regions may be defined by one image, a set of images, one video scene or a plurality of video scenes.

For the sake of exposition, FIGS. 4 and 5 depict one embodiment of the use of Level 2 metadata. FIG. 4 is a mapping 400 of input luminance to output luminance on the target display. Mapping 400 is depicted here as a substantially sigmoidal curve comprising a set of break points along its curve. These points may correspond to image processing relevant values—labeled there as: min_in, foot_in, mid_in, head_in, and max_in.

In this embodiment, min_in and max_in may correspond to the minimum and maximum luminance values for a scene. The third point, mid_in, may be the middle value, which corresponds to a perceptually 'average' luminance value or 'middle grey'. The final two points, foot_in and head_in, may be the footroom and headroom values. The region between the footroom and headroom values may define an important portion of the scene's dynamic range: it may be desirable that content between these points be preserved as much as possible. Content below the footroom may be crushed if desired. Content above the headroom corresponds to highlights and may be clipped if desired. It should be appreciated that these points tend to define a curve themselves, so another embodiment might fit a best-fit curve to these points. Additionally, such a curve might assume a linear, gamma, sigmoidal or any other suitable and/or desirable shape.
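The following is a hedged sketch, in Python/NumPy, of one way to realize such a five-point mapping, here as monotone piecewise-linear interpolation through the (min, foot, mid, head, max) pairs; the text equally allows a smooth best-fit curve through the same points, and the numeric values below are purely illustrative:

```python
import numpy as np

def five_point_curve(L_in, pts_in, pts_out):
    """Map luminance through the five Level 2 control points
    (min, foot, mid, head, max). pts_in/pts_out are 5-element,
    monotonically increasing sequences in matching units."""
    return np.interp(L_in, pts_in, pts_out)

# Illustrative numbers only: compress a 0.05-600 nit scene onto a
# 0.1-200 nit display, preserving the foot..head region most faithfully.
src = [0.05, 0.5, 20.0, 400.0, 600.0]   # min, foot, mid, head, max (in)
dst = [0.10, 0.4, 15.0, 180.0, 200.0]   # corresponding output levels
mapped = five_point_curve(np.array([0.05, 20.0, 600.0]), src, dst)
```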

Further to this embodiment, FIG. 5 illustrates the minimum, footroom, middle, headroom and maximum points as shown on a histogram plot 500. Such a histogram may be produced on an image basis, a video scene basis or even on the basis of a set of video scenes, depending on which level of granularity of the histogram analysis is desired to help preserve content fidelity. In one embodiment, the five points may be specified in code values, in the same encoding representation as the video data. Note that the min and max may typically correspond to the same values as the range of the video signal, but not always.

Depending on the granularity and frequency of such histogram plots, the histogram analysis may be used to redefine points along the luminance map of FIG. 4 on a dynamic basis—and hence alter the curve over time. This may also be helpful to improve content fidelity as displayed to the viewer on a target display. For example, in one embodiment, passing the histogram along periodically allows the decoder to potentially derive more information than just the min, max, etc. The encoder might also only include a new histogram when there was a significant change. This might save the decoder the effort of calculating it for each frame on the fly. In yet another embodiment, the histogram might be used to estimate metadata—to either replace missing metadata or to supplement existing metadata.

Level 3

In one embodiment, for Level 3 metadata, the Level 1 and Level 2 metadata parameters may be employed for a second reference grading of the source content. For example, the primary grade of the source content may have been performed on a reference monitor (e.g. a ProMonitor) using a P3 gamut at 600 nits luminance. With Level 3 metadata, information on a secondary grading—performed, for example, on a CRT reference display—could be provided as well. In this case, the additional information would indicate Rec. 601 or Rec. 709 primaries and a lower luminance, like 120 nits. The corresponding min, foot, mid, head, and max levels would also be provided to the CM algorithm.

Level 3 metadata may add additional data—e.g. gamut, environment, primaries, etc.—and luminance level information for a second reference grading of the source content. This additional information may then be combined to define a sigmoidal curve 600 (as shown in FIG. 6) that will map the primary input to the range of the reference display. FIG. 6 shows an example of how the input and reference display (output) levels can be combined to form a suitable mapping curve.

If the target display's capabilities are a good match for the secondary reference display, then this curve can be used directly for mapping the primary source content. However, if the target display's capabilities sit somewhere between those of the primary and secondary reference displays, then the mapping curve for the secondary display can be used as a lower bound. The curve used for the actual target display can then be an interpolation between no reduction (e.g. a linear mapping 700 as shown in FIG. 7) and the full range reduction curve generated using the reference levels.
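One plausible reading of this interpolation is sketched below in Python/NumPy, assuming a `full_curve` callable embodying the FIG. 6 mapping and peak luminance as the capability measure; both assumptions are illustrative, not prescribed by the text:

```python
import numpy as np

def interpolated_map(L_in, full_curve, primary_peak, secondary_peak,
                     target_peak):
    """Blend between no reduction (a 1:1 mapping, FIG. 7) and the full
    range-reduction curve of FIG. 6, according to where the target
    display's peak sits between the primary and secondary reference
    displays."""
    # w = 0: target matches the primary (no reduction);
    # w = 1: target matches the secondary (full reduction).
    w = np.clip((primary_peak - target_peak) /
                (primary_peak - secondary_peak), 0.0, 1.0)
    return (1.0 - w) * np.asarray(L_in, float) + w * full_curve(L_in)
```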

Level 4

Level 4 metadata is the same as Level 3 metadata except that the metadata for the second reference grading is tailored to the actual target display.

Level 4 metadata could also be implemented in an over-the-top (OTT) scenario (i.e. Netflix, mobile streaming or some other VOD service), where the actual target display sends its characteristics to the content provider and the content is distributed with the most suitable curves available. In one such embodiment, the target display may be in communication with the video streaming service, VOD service or the like, and the target display may send to the streaming service information such as its EDID data or any other suitable metadata available. Such a communication path is depicted as the dotted line path 240 in FIG. 2A, to either the video and/or the metadata encoder (210 and 206 respectively)—as is known in the art with services like Netflix and the like. Typically, Netflix and other such VOD services monitor the amount of data and speed of data throughput to the target device—and not necessarily metadata for color management purposes. It is sufficient for the purposes of the present embodiment, though, that metadata be sent from the target display to the creation or post-production stage—either through the distribution 212 or otherwise (in realtime or a priori)—to change the color, tone or other characteristics of the image data to be delivered to the target display.

With Level 4 metadata, the reference luminance levels provided are specifically for the target display. In this case, a sigmoidal curve could be constructed as shown in FIG. 6 and used directly without any interpolation or adjustment.

Level 5

Level 5 metadata enhances Level 3 or Level 4 with identification of salient features such as the following:

(1) Protected colors—colors in the image that have been identified as common memory colors that should not be processed, such as skin tones, the color of the sky, grass, etc. Regions of an image having protected colors may have their image data passed onto the target display without alteration.
(2) Salient highlights—identified light sources, maximum emissive and specular highlights.
(3) Out of gamut colors—features in the image that have been purposely color graded outside the gamut of the source content.

In some embodiments, if the target display is capable of higher luminance, these identified objects could be artificially mapped to the maximum of the display. If the target display is capable of lower luminance, these objects could be clipped to the display maximum without compensating for detail. These objects might then be ignored, and the mapping curve defined might be applied to the remaining content to maintain higher amounts of detail.
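A minimal sketch of this handling, assuming Python/NumPy and that the Level 5 annotations have already been rasterized into boolean masks (an illustrative assumption), might look like:

```python
import numpy as np

def level5_map(L_in, tone_curve, protected_mask, highlight_mask,
               display_peak):
    """Apply the mapping curve only to unflagged content: protected
    (memory-color) regions pass through unaltered, and flagged salient
    highlights are clipped to the display maximum. Masks are boolean
    arrays the same shape as L_in."""
    out = tone_curve(L_in)                        # default: mapped content
    out = np.where(highlight_mask,
                   np.minimum(L_in, display_peak), out)  # clip highlights
    out = np.where(protected_mask, L_in, out)     # pass through unaltered
    return out
```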

It should also be appreciated that, in some embodiments—e.g. in the case of trying to map VDR down to a lower dynamic range display—it may be useful to know the light sources and highlights, because one might clip them without doing too much harm. A brightly lit face, on the other hand (i.e. definitely not a light source), may not be a feature that is desirable to clip. Alternatively, such a feature might be compressed more gradually. In yet another embodiment, if the target display is capable of a wider gamut, these content objects may be expanded to the full capabilities of the display. Additionally, in another embodiment, the system might ignore any defined mapping curve in order to preserve a highly saturated color.

It should be appreciated that, in several embodiments of the present application, the levels themselves may not be a strict hierarchy of metadata processing. For example, Level 5 could apply to either Level 3 or Level 4 data. In addition, some lower numbered levels may not be present; yet the system may process higher numbered levels, if present.

One Embodiment of a System Employing Multiple Metadata Levels

As discussed above, the varying metadata levels provide increasing information about the source material that allows a CM algorithm to provide increasingly accurate mappings for a target display. One embodiment that employs such scalable and varying levels of metadata is shown in FIG. 8.

System 800, as depicted, shows an entire video/metadata pipeline through five blocks—creation 802, container 808, encoding/distribution 814, decoding 822 and consumption 834. It will be appreciated that many variations of different implementations are possible—some having more blocks and some fewer. The scope of the present application should not be limited to the recitation of the embodiments herein and, in fact, the scope of the present application encompasses these various implementations and embodiments.

Creation 802 broadly takes image/video content 804 and processes it, as previously discussed, through a color grading tool 806. Processed video and metadata are placed in a suitable container 810—e.g. any suitable format or data structure that is known in the art for subsequent dissemination. For one example, video may be stored and sent as VDR color graded video, and metadata as VDR XML formatted metadata. This metadata, as shown in 812, is partitioned into the various Levels previously discussed. In the container block, it is possible to embed data into the formatted metadata that encodes which levels of metadata are available and associated with the image/video data. It should be appreciated that not all levels of metadata need be associated with the image/video data; whatever metadata and levels are associated, the decoding and rendering downstream may be able to ascertain and process such available metadata appropriately.

Encoding may proceed by taking the metadata and providing it to algorithm parameter determination block 816, while the video may be provided to AVCVDR encoder 818—which may also comprise a CM block for processing video prior to distribution 820.

Once distributed (conceived broadly via, e.g., Internet, DVD, cable, satellite, wireless, wired or the like), decoding of the video/metadata data may proceed to AVCVDR decoder 824 (or optionally to a legacy decoder 826, if the target display is not VDR enabled). Both video data and metadata are recovered from decoding (as blocks 830 and 828 respectively—and possibly 832, if the target display is legacy). Decoder 824 may take input image/video data and recover and/or split out the input image data into an image/video data stream to be further processed and rendered, and a metadata stream for calculating parameters for later CM algorithm processing on the image/video data stream to be rendered. The metadata stream should also contain information as to whether there is any metadata associated with the image/video data stream. If no metadata is associated, then the system may proceed with Level 0 processing as discussed above. Otherwise, the system may proceed with further processing according to whatever metadata is associated with the image/video data stream, as discussed above according to a set of varying levels of metadata.
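As an illustration of this decision, the sketch below (Python; the metadata dict layout and function name are hypothetical) selects the richest available processing path, falling back to Level 0 when no metadata is present:

```python
from typing import Optional

def select_cm_processing(metadata: Optional[dict]) -> str:
    """Pick the richest CM processing the associated metadata levels
    allow, falling back to Level 0 when none is present."""
    if not metadata or not metadata.get("levels"):
        return "level0"              # defaults / adaptive estimation
    levels = set(metadata["levels"])
    for lvl, name in ((4, "level4"), (3, "level3"),
                      (2, "level2"), (1, "level1")):
        if lvl in levels:
            # Level 5 feature flags, if present, refine Level 3/4 handling.
            return name
    return "level0"
```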

It will be appreciated that whether there is or is not any metadata associated with the image/video data to be rendered may be determined in real-time. For example, it may be possible that for some sections of a video stream no metadata is associated (whether through data corruption or because the content creator intended that there be no metadata)—and for other sections there may be metadata, perhaps a rich set of metadata with varying levels, available and associated with those sections of the video stream. This may be intentional on the part of the content creator; but at least one embodiment of the present application should be able to make such determinations as to whether any, or what level, of metadata is associated with the video stream on a real-time or substantially dynamic basis.

In the consumption block, algorithm parameter determination block 836 can either recover the previous parameters, perhaps computed prior to distribution, or may recalculate parameters based on metadata from the target display and/or target environment (perhaps from standard interfaces, e.g. EDID or emerging VDR interfaces, as well as input from the viewer or sensors in the target environment—as discussed previously in the context of the embodiment of FIGS. 2A and/or 2B). Once the parameters have been calculated or recovered, they may be sent to one or more of the CM systems (838, 840 and/or 842) for final mapping of source and intermediate image/video data onto the target display 844 in accordance with the several embodiments disclosed herein.

In other embodiments, the implementation blocks of FIG. 8 need not be so finely divided. For example, and broadly speaking, the processing involving the algorithm parameter determination and the color management algorithm itself need not necessarily be bifurcated as shown in FIG. 8; they may be conceived of, and/or implemented as, a single color management module.

In addition, while a set of varying levels of metadata for use by a video/image pipeline has been described herein, it should be appreciated that, in practice, the system does not need to process the image/video data in the exact order in which the Levels of metadata are numbered. In fact, it may be the case that some levels of metadata are available at the time of rendering, and other levels not available. For example, a second reference color grading may or may not be performed, and thus Level 3 metadata may or may not be present at the time of rendering. A system made in accordance with the present application takes the presence or absence of the different levels of metadata into consideration and continues with the best metadata processing possible at the time.

A detailed description of one or more embodiments of the invention, read along with accompanying figures that illustrate the principles of the invention, has now been given. It is to be appreciated that the invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details have been set forth in this description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

1. (canceled)
2. A method for processing image data through metadata associated with the image data, said method comprising: receiving the image data as a bitstream at a destination device; decoding the image data; determining, by the destination device, if a first set of metadata associated with the image data is received, wherein the first set of metadata includes a representation of parameters of a reference display; and determining, by the destination device, if a second set of metadata for video content characteristics of the image data is received, the second set of metadata including at least a maximum luminance level of the image data, wherein the first set of metadata includes: a. a white point, represented as x, y chromaticity coordinates for the reference display, b. three primaries, each represented as x, y chromaticity coordinates for the reference display, c. a minimum luminance level for the reference display, and d. a maximum luminance level for the reference display, and wherein determining if the second set of metadata is received is independent of determining if the first set of metadata is received.
3. The method of claim 2, wherein the first set of metadata and the second set of metadata are received in the bitstream.
4. The method of claim 3, wherein the first set of metadata and the second set of metadata are separately partitioned in the bitstream.
5. The method of claim 2, further comprising receiving, by the destination device, metadata characterizing ambient light conditions.
6. An apparatus comprising: at least one non-transitory memory; and a bitstream stored on the at least one non-transitory memory, the bitstream including a first set of metadata and a second set of metadata, wherein the first set of metadata and the second set of metadata are separately partitioned in the bitstream; wherein the first set of metadata includes at least: a. a white point, represented as x, y chromaticity coordinates for a reference display, b. three primaries, each represented as x, y chromaticity coordinates for the reference display, c. a minimum luminance level for the reference display, and d. a maximum luminance level for the reference display; wherein the second set of metadata includes at least a maximum luminance level of the image data.